Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 421570 |
| Missing cells | 1422431 |
| Missing cells (%) | 21.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 51.9 MiB |
| Average record size in memory | 129.0 B |
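The derived rows in this table follow from the raw counts. The sketch below recomputes them from the figures shown above; the variable names are illustrative and are not part of the report. Because the total size is itself rounded to 51.9 MiB, the recomputed record size only agrees with the reported 129.0 B to within rounding.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 421_570
missing_cells = 1_422_431
total_size_mib = 51.9  # already rounded in the report

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```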
Variable types
| Numeric | 13 |
|---|---|
| Categorical | 2 |
| Boolean | 1 |
| Date has a high cardinality: 143 distinct values | High cardinality |
|---|---|
| MarkDown1 is highly correlated with MarkDown4 | High correlation |
| MarkDown4 is highly correlated with MarkDown1 | High correlation |
| Size is highly correlated with MarkDown5 | High correlation |
| MarkDown1 is highly correlated with MarkDown4 and 1 other field | High correlation |
| MarkDown5 is highly correlated with Size and 1 other field | High correlation |
| Unemployment is highly correlated with Store and 2 other fields | High correlation |
| Store is highly correlated with Unemployment and 3 other fields | High correlation |
| Size is highly correlated with Store and 2 other fields | High correlation |
| Fuel_Price is highly correlated with Unemployment and 1 other field | High correlation |
| CPI is highly correlated with Unemployment and 2 other fields | High correlation |
| Temperature is highly correlated with Fuel_Price | High correlation |
| Type is highly correlated with Store and 1 other field | High correlation |
| MarkDown1 has 270889 (64.3%) missing values | Missing |
| MarkDown2 has 310322 (73.6%) missing values | Missing |
| MarkDown3 has 284479 (67.5%) missing values | Missing |
| MarkDown4 has 286603 (68.0%) missing values | Missing |
| MarkDown5 has 270138 (64.1%) missing values | Missing |
| Date is uniformly distributed | Uniform |
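The missing-value percentages above can be cross-checked against the dataset totals. The sketch below uses only the counts reported in this document (the dictionary name is illustrative) and shows that the five MarkDown columns together account for every missing cell in the dataset.

```python
# Counts copied from the "Missing" warnings and the dataset statistics.
n_observations = 421_570
missing_by_column = {
    "MarkDown1": 270_889,
    "MarkDown2": 310_322,
    "MarkDown3": 284_479,
    "MarkDown4": 286_603,
    "MarkDown5": 270_138,
}

# Share of rows with a missing value, per column, rounded as in the report.
missing_pct = {
    column: round(100 * count / n_observations, 1)
    for column, count in missing_by_column.items()
}

# Sum over the five columns, to compare with the "Missing cells" total.
total_missing = sum(missing_by_column.values())

for column, pct in missing_pct.items():
    print(f"{column}: {pct}% missing")
print(f"Total missing cells: {total_missing}")
```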
Reproduction
| Analysis started | 2022-06-05 15:20:07.630323 |
|---|---|
| Analysis finished | 2022-06-05 15:25:43.102260 |
| Duration | 5 minutes and 35.47 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
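As a quick consistency check, the reported duration can be recomputed from the two timestamps with the standard library. This is only a sketch of the arithmetic, not how the profiling tool measures its run time.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2022-06-05 15:20:07.630323")
finished = datetime.fromisoformat("2022-06-05 15:25:43.102260")

elapsed = finished - started
# Split the elapsed seconds into whole minutes and remaining seconds.
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```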
Store
Real number (ℝ≥0)
| Distinct | 45 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.20054558 |
| Minimum | 1 |
| Maximum | 45 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 11 |
| median | 22 |
| Q3 | 33 |
| 95-th percentile | 43 |
| Maximum | 45 |
| Range | 44 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 12.78529739 |
|---|---|
| Coefficient of variation (CV) | 0.5759001437 |
| Kurtosis | -1.146502781 |
| Mean | 22.20054558 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.07776250175 |
| Sum | 9359084 |
| Variance | 163.4638293 |
| Monotonicity | Increasing |
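Several of the statistics above are simple functions of the others. The sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation; the variable names are illustrative.

```python
# Values copied from the quantile and descriptive statistics above.
minimum, maximum = 1, 45
q1, q3 = 11, 33
mean = 22.20054558
std = 12.78529739

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 4), round(variance, 4))
```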
| Value | Count | Frequency (%) |
| 13 | 10474 | 2.5% |
| 10 | 10315 | 2.4% |
| 4 | 10272 | 2.4% |
| 1 | 10244 | 2.4% |
| 2 | 10238 | 2.4% |
| 24 | 10228 | 2.4% |
| 27 | 10225 | 2.4% |
| 34 | 10224 | 2.4% |
| 20 | 10214 | 2.4% |
| 6 | 10211 | 2.4% |
| Other values (35) | 318925 |
| Value | Count | Frequency (%) |
| 1 | 10244 | |
| 2 | 10238 | |
| 3 | 9036 | |
| 4 | 10272 | |
| 5 | 8999 | |
| 6 | 10211 | |
| 7 | 9762 | |
| 8 | 9895 | |
| 9 | 8867 | |
| 10 | 10315 |
| Value | Count | Frequency (%) |
| 45 | 9637 | |
| 44 | 7169 | |
| 43 | 6751 | |
| 42 | 6953 | |
| 41 | 10088 | |
| 40 | 10017 | |
| 39 | 9878 | |
| 38 | 7362 | |
| 37 | 7206 | |
| 36 | 6222 |
Dept
Real number (ℝ≥0)
| Distinct | 81 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.26031739 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 18 |
| median | 37 |
| Q3 | 74 |
| 95-th percentile | 95 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 56 |
Descriptive statistics
| Standard deviation | 30.49205402 |
|---|---|
| Coefficient of variation (CV) | 0.6889253358 |
| Kurtosis | -1.215570579 |
| Mean | 44.26031739 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 0.3582231935 |
| Sum | 18658822 |
| Variance | 929.7653581 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 6435 | 1.5% |
| 16 | 6435 | 1.5% |
| 92 | 6435 | 1.5% |
| 38 | 6435 | 1.5% |
| 40 | 6435 | 1.5% |
| 2 | 6435 | 1.5% |
| 82 | 6435 | 1.5% |
| 46 | 6435 | 1.5% |
| 95 | 6435 | 1.5% |
| 81 | 6435 | 1.5% |
| Other values (71) | 357220 |
| Value | Count | Frequency (%) |
| 1 | 6435 | |
| 2 | 6435 | |
| 3 | 6435 | |
| 4 | 6435 | |
| 5 | 6347 | |
| 6 | 5986 | |
| 7 | 6435 | |
| 8 | 6435 | |
| 9 | 6354 | |
| 10 | 6435 |
| Value | Count | Frequency (%) |
| 99 | 862 | 0.2% |
| 98 | 5836 | |
| 97 | 6278 | |
| 96 | 4854 | |
| 95 | 6435 | |
| 94 | 5685 | |
| 93 | 5913 | |
| 92 | 6435 | |
| 91 | 6435 | |
| 90 | 6435 |
Date
Categorical
| Distinct | 143 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.4 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 4215700 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2010-02-05 |
|---|---|
| 2nd row | 2010-02-05 |
| 3rd row | 2010-02-05 |
| 4th row | 2010-02-05 |
| 5th row | 2010-02-05 |
Common Values
| Value | Count | Frequency (%) |
| 2011-12-23 | 3027 | 0.7% |
| 2011-11-25 | 3021 | 0.7% |
| 2011-12-16 | 3013 | 0.7% |
| 2011-12-09 | 3010 | 0.7% |
| 2012-02-17 | 3007 | 0.7% |
| 2011-12-30 | 3003 | 0.7% |
| 2012-02-10 | 3001 | 0.7% |
| 2011-12-02 | 2994 | 0.7% |
| 2012-03-02 | 2990 | 0.7% |
| 2012-10-12 | 2990 | 0.7% |
| Other values (133) | 391514 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1098526 | |
| 1 | 899707 | |
| - | 843140 | |
| 2 | 791099 | |
| 3 | 103408 | 2.5% |
| 4 | 82539 | 2.0% |
| 6 | 82450 | 2.0% |
| 7 | 82241 | 2.0% |
| 9 | 79610 | 1.9% |
| 5 | 76564 | 1.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3372560 | |
| Dash Punctuation | 843140 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1098526 | |
| 1 | 899707 | |
| 2 | 791099 | |
| 3 | 103408 | 3.1% |
| 4 | 82539 | 2.4% |
| 6 | 82450 | 2.4% |
| 7 | 82241 | 2.4% |
| 9 | 79610 | 2.4% |
| 5 | 76564 | 2.3% |
| 8 | 76416 | 2.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 843140 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 4215700 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1098526 | |
| 1 | 899707 | |
| - | 843140 | |
| 2 | 791099 | |
| 3 | 103408 | 2.5% |
| 4 | 82539 | 2.0% |
| 6 | 82450 | 2.0% |
| 7 | 82241 | 2.0% |
| 9 | 79610 | 1.9% |
| 5 | 76564 | 1.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4215700 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1098526 | |
| 1 | 899707 | |
| - | 843140 | |
| 2 | 791099 | |
| 3 | 103408 | 2.5% |
| 4 | 82539 | 2.0% |
| 6 | 82450 | 2.0% |
| 7 | 82241 | 2.0% |
| 9 | 79610 | 1.9% |
| 5 | 76564 | 1.8% |
Weekly_Sales
Real number (ℝ)
| Distinct | 359464 |
|---|---|
| Distinct (%) | 85.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15981.25812 |
| Minimum | -4988.94 |
| Maximum | 693099.36 |
| Zeros | 73 |
| Zeros (%) | < 0.1% |
| Negative | 1285 |
| Negative (%) | 0.3% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | -4988.94 |
|---|---|
| 5-th percentile | 59.9745 |
| Q1 | 2079.65 |
| median | 7612.03 |
| Q3 | 20205.8525 |
| 95-th percentile | 61201.951 |
| Maximum | 693099.36 |
| Range | 698088.3 |
| Interquartile range (IQR) | 18126.2025 |
Descriptive statistics
| Standard deviation | 22711.18352 |
|---|---|
| Coefficient of variation (CV) | 1.421113616 |
| Kurtosis | 21.49128991 |
| Mean | 15981.25812 |
| Median Absolute Deviation (MAD) | 6747.645 |
| Skewness | 3.262008185 |
| Sum | 6737218987 |
| Variance | 515797856.8 |
| Monotonicity | Not monotonic |
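A skewness of about 3.26 and a kurtosis of about 21.49 point to a long right tail, which is consistent with the mean (15981.26) sitting well above the median (7612.03). The sketch below shows how these two shape statistics can be computed from central moments. It uses the simple population-moment definitions and a small hypothetical sample; the profiling tool is assumed to use bias-adjusted estimators, so its values can differ slightly on small samples.

```python
from statistics import fmean


def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mu = fmean(values)
    return fmean([(v - mu) ** k for v in values])


def skewness(values):
    """Third standardized moment; positive for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical samples, not taken from the dataset.
right_skewed = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 50.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(skewness(right_skewed), excess_kurtosis(right_skewed))
print(skewness(symmetric), excess_kurtosis(symmetric))
```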
| Value | Count | Frequency (%) |
| 10 | 353 | 0.1% |
| 5 | 289 | 0.1% |
| 20 | 232 | 0.1% |
| 15 | 215 | 0.1% |
| 12 | 175 | < 0.1% |
| 1 | 169 | < 0.1% |
| 10.47 | 167 | < 0.1% |
| 11.97 | 154 | < 0.1% |
| 2 | 148 | < 0.1% |
| 7 | 146 | < 0.1% |
| Other values (359454) | 419522 |
| Value | Count | Frequency (%) |
| -4988.94 | 1 | < 0.1% |
| -3924 | 1 | < 0.1% |
| -1750 | 1 | < 0.1% |
| -1699 | 1 | < 0.1% |
| -1321.48 | 1 | < 0.1% |
| -1098 | 3 | |
| -1008.96 | 1 | < 0.1% |
| -898 | 1 | < 0.1% |
| -863 | 1 | < 0.1% |
| -798 | 4 |
| Value | Count | Frequency (%) |
| 693099.36 | 1 | |
| 649770.18 | 1 | |
| 630999.19 | 1 | |
| 627962.93 | 1 | |
| 474330.1 | 1 | |
| 422306.25 | 1 | |
| 420586.57 | 1 | |
| 406988.63 | 1 | |
| 404245.03 | 1 | |
| 393705.2 | 1 |
IsHoliday
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.6 MiB |
| Value | Count | Frequency (%) |
| False | 391909 | 93.0% |
| True | 29661 | 7.0% |
Type
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.4 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 421570 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | A |
|---|---|
| 2nd row | A |
| 3rd row | A |
| 4th row | A |
| 5th row | A |
Common Values
| Value | Count | Frequency (%) |
| A | 215478 | 51.1% |
| B | 163495 | 38.8% |
| C | 42597 | 10.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 215478 | |
| B | 163495 | |
| C | 42597 | 10.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 421570 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 215478 | |
| B | 163495 | |
| C | 42597 | 10.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 421570 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 215478 | |
| B | 163495 | |
| C | 42597 | 10.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 421570 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| A | 215478 | |
| B | 163495 | |
| C | 42597 | 10.1% |
Size
Real number (ℝ≥0)
| Distinct | 40 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 136727.9157 |
| Minimum | 34875 |
| Maximum | 219622 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 34875 |
|---|---|
| 5-th percentile | 39690 |
| Q1 | 93638 |
| median | 140167 |
| Q3 | 202505 |
| 95-th percentile | 206302 |
| Maximum | 219622 |
| Range | 184747 |
| Interquartile range (IQR) | 108867 |
Descriptive statistics
| Standard deviation | 60980.58333 |
|---|---|
| Coefficient of variation (CV) | 0.4459995093 |
| Kurtosis | -1.206345903 |
| Mean | 136727.9157 |
| Median Absolute Deviation (MAD) | 62140 |
| Skewness | -0.3258497665 |
| Sum | 5.764038744 × 10^10 |
| Variance | 3718631543 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 39690 | 20802 | 4.9% |
| 39910 | 20597 | 4.9% |
| 203819 | 20376 | 4.8% |
| 219622 | 10474 | 2.5% |
| 126512 | 10315 | 2.4% |
| 205863 | 10272 | 2.4% |
| 151315 | 10244 | 2.4% |
| 202307 | 10238 | 2.4% |
| 204184 | 10225 | 2.4% |
| 158114 | 10224 | 2.4% |
| Other values (30) | 287803 |
| Value | Count | Frequency (%) |
| 34875 | 8999 | |
| 37392 | 9036 | |
| 39690 | 20802 | |
| 39910 | 20597 | |
| 41062 | 6751 | 1.6% |
| 42988 | 7156 | 1.7% |
| 57197 | 9443 | |
| 70713 | 9762 | |
| 93188 | 9864 | |
| 93638 | 9455 |
| Value | Count | Frequency (%) |
| 219622 | 10474 | |
| 207499 | 10062 | |
| 206302 | 10113 | |
| 205863 | 10272 | |
| 204184 | 10225 | |
| 203819 | 20376 | |
| 203750 | 10142 | |
| 203742 | 10214 | |
| 203007 | 10202 | |
| 202505 | 10211 |
Temperature
Real number (ℝ)
| Distinct | 3528 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.09005873 |
| Minimum | -2.06 |
| Maximum | 100.14 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 69 |
| Negative (%) | < 0.1% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | -2.06 |
|---|---|
| 5-th percentile | 27.31 |
| Q1 | 46.68 |
| median | 62.09 |
| Q3 | 74.28 |
| 95-th percentile | 87.27 |
| Maximum | 100.14 |
| Range | 102.2 |
| Interquartile range (IQR) | 27.6 |
Descriptive statistics
| Standard deviation | 18.44793115 |
|---|---|
| Coefficient of variation (CV) | 0.3070047115 |
| Kurtosis | -0.6359219778 |
| Mean | 60.09005873 |
| Median Absolute Deviation (MAD) | 13.63 |
| Skewness | -0.321404152 |
| Sum | 25332166.06 |
| Variance | 340.3261636 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 50.43 | 709 | 0.2% |
| 67.87 | 646 | 0.2% |
| 72.62 | 594 | 0.1% |
| 76.67 | 583 | 0.1% |
| 70.28 | 563 | 0.1% |
| 76.03 | 555 | 0.1% |
| 50.56 | 544 | 0.1% |
| 64.05 | 542 | 0.1% |
| 64.21 | 519 | 0.1% |
| 50.81 | 487 | 0.1% |
| Other values (3518) | 415828 |
| Value | Count | Frequency (%) |
| -2.06 | 69 | |
| 5.54 | 68 | |
| 6.23 | 69 | |
| 7.46 | 69 | |
| 9.51 | 70 | |
| 9.55 | 69 | |
| 10.09 | 66 | |
| 10.11 | 68 | |
| 10.24 | 69 | |
| 10.53 | 72 |
| Value | Count | Frequency (%) |
| 100.14 | 44 | < 0.1% |
| 100.07 | 46 | < 0.1% |
| 99.66 | 48 | < 0.1% |
| 99.22 | 185 | |
| 99.2 | 46 | < 0.1% |
| 98.43 | 43 | < 0.1% |
| 98.15 | 47 | < 0.1% |
| 97.66 | 42 | < 0.1% |
| 97.6 | 48 | < 0.1% |
| 97.18 | 187 |
Fuel_Price
Real number (ℝ≥0)
| Distinct | 892 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.361026527 |
| Minimum | 2.472 |
| Maximum | 4.468 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 2.472 |
|---|---|
| 5-th percentile | 2.653 |
| Q1 | 2.933 |
| median | 3.452 |
| Q3 | 3.738 |
| 95-th percentile | 4.029 |
| Maximum | 4.468 |
| Range | 1.996 |
| Interquartile range (IQR) | 0.805 |
Descriptive statistics
| Standard deviation | 0.4585145371 |
|---|---|
| Coefficient of variation (CV) | 0.1364209813 |
| Kurtosis | -1.185404505 |
| Mean | 3.361026527 |
| Median Absolute Deviation (MAD) | 0.375 |
| Skewness | -0.1049014956 |
| Sum | 1416907.953 |
| Variance | 0.2102355808 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3.638 | 2548 | 0.6% |
| 3.63 | 2164 | 0.5% |
| 2.771 | 1917 | 0.5% |
| 3.891 | 1856 | 0.4% |
| 3.594 | 1796 | 0.4% |
| 3.524 | 1793 | 0.4% |
| 3.523 | 1792 | 0.4% |
| 2.72 | 1790 | 0.4% |
| 3.666 | 1778 | 0.4% |
| 2.78 | 1656 | 0.4% |
| Other values (882) | 402480 |
| Value | Count | Frequency (%) |
| 2.472 | 38 | < 0.1% |
| 2.513 | 45 | < 0.1% |
| 2.514 | 906 | |
| 2.52 | 39 | < 0.1% |
| 2.533 | 42 | < 0.1% |
| 2.539 | 37 | < 0.1% |
| 2.54 | 147 | < 0.1% |
| 2.542 | 45 | < 0.1% |
| 2.545 | 38 | < 0.1% |
| 2.548 | 902 |
| Value | Count | Frequency (%) |
| 4.468 | 368 | |
| 4.449 | 358 | |
| 4.308 | 168 | |
| 4.301 | 360 | |
| 4.294 | 363 | |
| 4.293 | 192 | |
| 4.288 | 172 | |
| 4.282 | 173 | |
| 4.277 | 357 | |
| 4.273 | 366 |
MarkDown1
Real number (ℝ≥0)
| Distinct | 2277 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 270889 |
| Missing (%) | 64.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7246.420196 |
| Minimum | 0.27 |
| Maximum | 88646.76 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0.27 |
|---|---|
| 5-th percentile | 149.19 |
| Q1 | 2240.27 |
| median | 5347.45 |
| Q3 | 9210.9 |
| 95-th percentile | 21801.35 |
| Maximum | 88646.76 |
| Range | 88646.49 |
| Interquartile range (IQR) | 6970.63 |
Descriptive statistics
| Standard deviation | 8291.221345 |
|---|---|
| Coefficient of variation (CV) | 1.144181695 |
| Kurtosis | 17.60626321 |
| Mean | 7246.420196 |
| Median Absolute Deviation (MAD) | 3430.74 |
| Skewness | 3.341844686 |
| Sum | 1091897842 |
| Variance | 68744351.4 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.5 | 102 | < 0.1% |
| 460.73 | 102 | < 0.1% |
| 175.64 | 93 | < 0.1% |
| 1282.42 | 75 | < 0.1% |
| 9264.48 | 75 | < 0.1% |
| 686.24 | 75 | < 0.1% |
| 5924.71 | 75 | < 0.1% |
| 1483.17 | 75 | < 0.1% |
| 3124.45 | 74 | < 0.1% |
| 6809.96 | 74 | < 0.1% |
| Other values (2267) | 149861 | |
| (Missing) | 270889 |
| Value | Count | Frequency (%) |
| 0.27 | 51 | |
| 0.5 | 49 | |
| 1.5 | 102 | |
| 1.94 | 50 | |
| 2.12 | 52 | |
| 2.4 | 49 | |
| 2.42 | 50 | |
| 2.43 | 51 | |
| 2.8 | 50 | |
| 2.91 | 51 |
| Value | Count | Frequency (%) |
| 88646.76 | 68 | |
| 78124.5 | 70 | |
| 75149.79 | 73 | |
| 65021.23 | 73 | |
| 62567.6 | 66 | |
| 62172.73 | 72 | |
| 60740.64 | 70 | |
| 60394.73 | 72 | |
| 58928.52 | 72 | |
| 56917.7 | 71 |
MarkDown2
Real number (ℝ)
| Distinct | 1499 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 310322 |
| Missing (%) | 73.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3334.628621 |
| Minimum | -265.76 |
| Maximum | 104519.54 |
| Zeros | 207 |
| Zeros (%) | < 0.1% |
| Negative | 1311 |
| Negative (%) | 0.3% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | -265.76 |
|---|---|
| 5-th percentile | 1.95 |
| Q1 | 41.6 |
| median | 192 |
| Q3 | 1926.94 |
| 95-th percentile | 16497.47 |
| Maximum | 104519.54 |
| Range | 104785.3 |
| Interquartile range (IQR) | 1885.34 |
Descriptive statistics
| Standard deviation | 9475.357325 |
|---|---|
| Coefficient of variation (CV) | 2.841503028 |
| Kurtosis | 37.58956105 |
| Mean | 3334.628621 |
| Median Absolute Deviation (MAD) | 184.73 |
| Skewness | 5.441261196 |
| Sum | 370970764.8 |
| Variance | 89782396.45 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.91 | 539 | 0.1% |
| 3 | 493 | 0.1% |
| 0.5 | 485 | 0.1% |
| 1.5 | 471 | 0.1% |
| 4 | 367 | 0.1% |
| 6 | 365 | 0.1% |
| 7.64 | 354 | 0.1% |
| 3.82 | 353 | 0.1% |
| 19 | 345 | 0.1% |
| 5.73 | 345 | 0.1% |
| Other values (1489) | 107131 | 25.4% |
| (Missing) | 310322 |
| Value | Count | Frequency (%) |
| -265.76 | 71 | |
| -192 | 72 | |
| -20 | 72 | |
| -10.98 | 60 | |
| -10.5 | 143 | |
| -9.98 | 68 | |
| -9.94 | 62 | |
| -7.6 | 69 | |
| -7.01 | 69 | |
| -6.69 | 69 |
| Value | Count | Frequency (%) |
| 104519.54 | 72 | |
| 97740.99 | 73 | |
| 92523.94 | 73 | |
| 89121.94 | 74 | |
| 82881.16 | 73 | |
| 72413.71 | 72 | |
| 70574.85 | 71 | |
| 58804.91 | 69 | |
| 58046.41 | 71 | |
| 56106.2 | 72 |
MarkDown3
Real number (ℝ)
| Distinct | 1662 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 284479 |
| Missing (%) | 67.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1439.421384 |
| Minimum | -29.1 |
| Maximum | 141630.61 |
| Zeros | 67 |
| Zeros (%) | < 0.1% |
| Negative | 257 |
| Negative (%) | 0.1% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | -29.1 |
|---|---|
| 5-th percentile | 0.65 |
| Q1 | 5.08 |
| median | 24.6 |
| Q3 | 103.99 |
| 95-th percentile | 1059.9 |
| Maximum | 141630.61 |
| Range | 141659.71 |
| Interquartile range (IQR) | 98.91 |
Descriptive statistics
| Standard deviation | 9623.07829 |
|---|---|
| Coefficient of variation (CV) | 6.685379553 |
| Kurtosis | 77.68777203 |
| Mean | 1439.421384 |
| Median Absolute Deviation (MAD) | 22.6 |
| Skewness | 8.399453018 |
| Sum | 197331717 |
| Variance | 92603635.78 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 754 | 0.2% |
| 6 | 710 | 0.2% |
| 2 | 660 | 0.2% |
| 1 | 611 | 0.1% |
| 0.22 | 487 | 0.1% |
| 0.5 | 463 | 0.1% |
| 0.01 | 444 | 0.1% |
| 4 | 439 | 0.1% |
| 3.2 | 379 | 0.1% |
| 1.98 | 363 | 0.1% |
| Other values (1652) | 131781 | |
| (Missing) | 284479 |
| Value | Count | Frequency (%) |
| -29.1 | 72 | < 0.1% |
| -1 | 70 | < 0.1% |
| -0.87 | 46 | < 0.1% |
| -0.2 | 69 | < 0.1% |
| 0 | 67 | < 0.1% |
| 0.01 | 444 | |
| 0.02 | 124 | < 0.1% |
| 0.04 | 241 | |
| 0.05 | 71 | < 0.1% |
| 0.06 | 205 |
| Value | Count | Frequency (%) |
| 141630.61 | 74 | |
| 109030.75 | 75 | |
| 103991.94 | 72 | |
| 101378.79 | 73 | |
| 89402.64 | 71 | |
| 88805.58 | 72 | |
| 83340.33 | 74 | |
| 83192.81 | 74 | |
| 79621.2 | 72 | |
| 77451.26 | 73 |
MarkDown4
Real number (ℝ≥0)
| Distinct | 1944 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 286603 |
| Missing (%) | 68.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3383.168256 |
| Minimum | 0.22 |
| Maximum | 67474.85 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0.22 |
|---|---|
| 5-th percentile | 28.76 |
| Q1 | 504.22 |
| median | 1481.31 |
| Q3 | 3595.04 |
| 95-th percentile | 12645.96 |
| Maximum | 67474.85 |
| Range | 67474.63 |
| Interquartile range (IQR) | 3090.82 |
Descriptive statistics
| Standard deviation | 6292.384031 |
|---|---|
| Coefficient of variation (CV) | 1.859908688 |
| Kurtosis | 29.99681491 |
| Mean | 3383.168256 |
| Median Absolute Deviation (MAD) | 1167.55 |
| Skewness | 4.847500037 |
| Sum | 456616070 |
| Variance | 39594096.79 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9 | 280 | 0.1% |
| 4 | 200 | < 0.1% |
| 2 | 197 | < 0.1% |
| 3 | 146 | < 0.1% |
| 47 | 143 | < 0.1% |
| 67.72 | 142 | < 0.1% |
| 657.56 | 141 | < 0.1% |
| 17 | 141 | < 0.1% |
| 8 | 140 | < 0.1% |
| 1330.36 | 140 | < 0.1% |
| Other values (1934) | 133297 | |
| (Missing) | 286603 |
| Value | Count | Frequency (%) |
| 0.22 | 57 | < 0.1% |
| 0.41 | 52 | < 0.1% |
| 0.46 | 48 | < 0.1% |
| 0.78 | 52 | < 0.1% |
| 0.87 | 49 | < 0.1% |
| 0.92 | 45 | < 0.1% |
| 1.5 | 55 | < 0.1% |
| 1.88 | 48 | < 0.1% |
| 1.98 | 44 | < 0.1% |
| 2 | 197 |
| Value | Count | Frequency (%) |
| 67474.85 | 72 | |
| 57817.56 | 74 | |
| 57815.43 | 68 | |
| 53603.99 | 72 | |
| 52739.02 | 72 | |
| 48403.53 | 70 | |
| 48159.86 | 73 | |
| 48086.64 | 72 | |
| 47452.43 | 73 | |
| 46238.28 | 71 |
MarkDown5
Real number (ℝ≥0)
| Distinct | 2293 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 270138 |
| Missing (%) | 64.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4628.975079 |
| Minimum | 135.16 |
| Maximum | 108519.28 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 135.16 |
|---|---|
| 5-th percentile | 715.52 |
| Q1 | 1878.44 |
| median | 3359.45 |
| Q3 | 5563.8 |
| 95-th percentile | 11269.24 |
| Maximum | 108519.28 |
| Range | 108384.12 |
| Interquartile range (IQR) | 3685.36 |
Descriptive statistics
| Standard deviation | 5962.887455 |
|---|---|
| Coefficient of variation (CV) | 1.288165815 |
| Kurtosis | 107.8492655 |
| Mean | 4628.975079 |
| Median Absolute Deviation (MAD) | 1702.47 |
| Skewness | 8.169909544 |
| Sum | 700974954.2 |
| Variance | 35556026.8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2743.18 | 136 | < 0.1% |
| 1064.56 | 120 | < 0.1% |
| 9083.54 | 75 | < 0.1% |
| 3567.03 | 75 | < 0.1% |
| 3557.67 | 75 | < 0.1% |
| 20371.02 | 75 | < 0.1% |
| 4180.29 | 75 | < 0.1% |
| 1773.53 | 74 | < 0.1% |
| 3932.94 | 74 | < 0.1% |
| 4464.45 | 74 | < 0.1% |
| Other values (2283) | 150579 | |
| (Missing) | 270138 |
| Value | Count | Frequency (%) |
| 135.16 | 65 | |
| 153.04 | 47 | |
| 153.9 | 49 | |
| 164.08 | 52 | |
| 170.64 | 69 | |
| 171.76 | 71 | |
| 180.07 | 64 | |
| 212.75 | 50 | |
| 224.86 | 50 | |
| 227.12 | 48 |
| Value | Count | Frequency (%) |
| 108519.28 | 68 | |
| 105223.11 | 70 | |
| 85851.87 | 68 | |
| 63005.58 | 69 | |
| 58068.14 | 69 | |
| 57029.78 | 68 | |
| 53212.72 | 70 | |
| 37581.27 | 70 | |
| 36430.33 | 71 | |
| 36360.42 | 72 |
CPI
Real number (ℝ≥0)
| Distinct | 2145 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 171.2019468 |
| Minimum | 126.064 |
| Maximum | 227.2328068 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 126.064 |
|---|---|
| 5-th percentile | 126.4962581 |
| Q1 | 132.0226667 |
| median | 182.3187801 |
| Q3 | 212.4169928 |
| 95-th percentile | 221.9415576 |
| Maximum | 227.2328068 |
| Range | 101.1688068 |
| Interquartile range (IQR) | 80.3943261 |
Descriptive statistics
| Standard deviation | 39.15927562 |
|---|---|
| Coefficient of variation (CV) | 0.2287314855 |
| Kurtosis | -1.829714364 |
| Mean | 171.2019468 |
| Median Absolute Deviation (MAD) | 41.4348629 |
| Skewness | 0.08521928473 |
| Sum | 72173604.72 |
| Variance | 1533.448867 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 129.8555333 | 711 | 0.2% |
| 131.1083333 | 708 | 0.2% |
| 129.8459667 | 707 | 0.2% |
| 130.3849032 | 706 | 0.2% |
| 130.6457931 | 706 | 0.2% |
| 131.0756667 | 706 | 0.2% |
| 130.683 | 706 | 0.2% |
| 130.4546207 | 705 | 0.2% |
| 130.7196333 | 705 | 0.2% |
| 130.737871 | 704 | 0.2% |
| Other values (2135) | 414506 |
| Value | Count | Frequency (%) |
| 126.064 | 678 | |
| 126.0766452 | 679 | |
| 126.0854516 | 675 | |
| 126.0892903 | 682 | |
| 126.1019355 | 686 | |
| 126.1069032 | 681 | |
| 126.1119032 | 682 | |
| 126.114 | 687 | |
| 126.1145806 | 689 | |
| 126.1266 | 683 |
| Value | Count | Frequency (%) |
| 227.2328068 | 63 | |
| 227.214288 | 62 | |
| 227.1693919 | 63 | |
| 227.0369359 | 70 | |
| 227.0184166 | 69 | |
| 226.9873637 | 134 | |
| 226.9735448 | 69 | |
| 226.9688442 | 134 | |
| 226.9662325 | 63 | |
| 226.9239785 | 135 |
Unemployment
Real number (ℝ≥0)
| Distinct | 349 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.960288695 |
| Minimum | 3.879 |
| Maximum | 14.313 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 3.879 |
|---|---|
| 5-th percentile | 5.326 |
| Q1 | 6.891 |
| median | 7.866 |
| Q3 | 8.572 |
| 95-th percentile | 12.187 |
| Maximum | 14.313 |
| Range | 10.434 |
| Interquartile range (IQR) | 1.681 |
Descriptive statistics
| Standard deviation | 1.863296038 |
|---|---|
| Coefficient of variation (CV) | 0.2340739275 |
| Kurtosis | 2.73121663 |
| Mean | 7.960288695 |
| Median Absolute Deviation (MAD) | 0.858 |
| Skewness | 1.183742568 |
| Sum | 3355818.905 |
| Variance | 3.471872127 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8.099 | 5152 | 1.2% |
| 8.163 | 3636 | 0.9% |
| 7.852 | 3614 | 0.9% |
| 7.343 | 3416 | 0.8% |
| 7.057 | 3414 | 0.8% |
| 7.931 | 3400 | 0.8% |
| 7.441 | 3397 | 0.8% |
| 6.565 | 3370 | 0.8% |
| 8.2 | 3361 | 0.8% |
| 6.891 | 3360 | 0.8% |
| Other values (339) | 385450 |
| Value | Count | Frequency (%) |
| 3.879 | 287 | 0.1% |
| 4.077 | 938 | |
| 4.125 | 1831 | |
| 4.145 | 562 | 0.1% |
| 4.156 | 1815 | |
| 4.261 | 1829 | |
| 4.308 | 935 | |
| 4.42 | 1855 | |
| 4.584 | 1988 | |
| 4.607 | 935 |
| Value | Count | Frequency (%) |
| 14.313 | 2636 | |
| 14.18 | 2423 | |
| 14.099 | 2441 | |
| 14.021 | 2263 | |
| 13.975 | 1529 | |
| 13.736 | 2464 | |
| 13.503 | 2661 | |
| 12.89 | 2491 | |
| 12.187 | 2507 | |
| 11.627 | 2502 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear function, the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
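The calculation described above can be sketched in plain Python. This is a minimal illustration with hypothetical data, not the routine the profiling tool calls; the function name `pearson_r` is an assumption made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: ys is an exact increasing linear function of xs.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
r = pearson_r(xs, ys)

# Shifting and rescaling xs (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_rescaled)
```

The function divides by zero if either input is constant, since a constant variable has no standard deviation to normalise by.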
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
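A minimal sketch of this definition follows: rank both variables, then apply the Pearson formula to the ranks. Ties receive the average of the ranks they span, which is a common convention and is assumed here. The function names and data are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: ys = xs ** 3 is monotonic but not linear in xs.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rho = spearman_rho(xs, ys)
print(rho, pearson_r(xs, ys))
```

On this data ρ reaches +1 while Pearson's r stays below 1, which illustrates the difference between monotonic and linear association.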
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
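The pair-counting definition above can be sketched directly. This is the plain variant exactly as described (often called tau-a); statistical libraries commonly use a tie-adjusted variant (tau-b) instead, so results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)


# Hypothetical data: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

This brute-force loop examines every pair, so it runs in quadratic time and is meant only to make the definition concrete.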
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | Store | Dept | Date | Weekly_Sales | IsHoliday | Type | Size | Temperature | Fuel_Price | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 | CPI | Unemployment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2010-02-05 | 24924.50 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 1 | 1 | 2 | 2010-02-05 | 50605.27 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 2 | 1 | 3 | 2010-02-05 | 13740.12 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 3 | 1 | 4 | 2010-02-05 | 39954.04 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 4 | 1 | 5 | 2010-02-05 | 32229.38 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 5 | 1 | 6 | 2010-02-05 | 5749.03 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 6 | 1 | 7 | 2010-02-05 | 21084.08 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 7 | 1 | 8 | 2010-02-05 | 40129.01 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 8 | 1 | 9 | 2010-02-05 | 16930.99 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
| 9 | 1 | 10 | 2010-02-05 | 30721.50 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 |
Last rows
| | Store | Dept | Date | Weekly_Sales | IsHoliday | Type | Size | Temperature | Fuel_Price | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 | CPI | Unemployment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 421560 | 45 | 85 | 2012-10-26 | 1689.10 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421561 | 45 | 87 | 2012-10-26 | 8187.66 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421562 | 45 | 90 | 2012-10-26 | 25352.32 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421563 | 45 | 91 | 2012-10-26 | 16330.84 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421564 | 45 | 92 | 2012-10-26 | 54608.75 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421565 | 45 | 93 | 2012-10-26 | 2487.80 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421566 | 45 | 94 | 2012-10-26 | 5203.31 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421567 | 45 | 95 | 2012-10-26 | 56017.47 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421568 | 45 | 97 | 2012-10-26 | 6817.48 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |
| 421569 | 45 | 98 | 2012-10-26 | 1076.80 | False | B | 118221 | 58.85 | 3.882 | 4018.91 | 58.08 | 100.0 | 211.94 | 858.33 | 192.308899 | 8.667 |